 model server


Model Callers for Transforming Predictive and Generative AI Applications

Dalal, Mukesh

arXiv.org Artificial Intelligence

We introduce a novel software abstraction termed "model caller," acting as an intermediary for AI and ML model calling, advocating its transformative utility beyond existing model-serving frameworks. This abstraction offers multiple advantages: enhanced accuracy and reduced latency in model predictions, superior monitoring and observability of models, more streamlined AI system architectures, simplified AI development and management processes, and improved collaboration and accountability across AI/ML/Data Science, software, data, and operations teams. Model callers are valuable for both creators and users of models within both predictive and generative AI applications. Additionally, we have developed and released a prototype Python library for model callers, accessible for installation via pip or for download from GitHub.
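The abstract describes a model caller as an intermediary that wraps model invocation and adds monitoring. As a rough illustration of the idea — the class and method names below are hypothetical assumptions, not the API of the released library — such an intermediary might look like:

```python
import time

# Hypothetical sketch of a "model caller" intermediary. The names and
# behavior here are illustrative assumptions, not the released library's API.
class ModelCaller:
    """Wraps a model's predict function, adding simple monitoring hooks."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self.call_count = 0       # observability: how often the model is called
        self.total_latency = 0.0  # observability: cumulative inference time

    def call(self, inputs):
        start = time.perf_counter()
        outputs = self._predict_fn(inputs)  # the actual model invocation
        self.total_latency += time.perf_counter() - start
        self.call_count += 1
        return outputs

# Usage: wrap a trivial stand-in model.
caller = ModelCaller(lambda xs: [x * 2 for x in xs])
result = caller.call([1, 2, 3])
```

Because every call flows through one object, metrics, caching, retries, or routing could be layered in without touching either the model or the application code.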


Top Tools To Do Machine Learning Serving In Production

#artificialintelligence

Creating a model is one thing, but using that model in production is quite another. The next step after a data scientist completes a model is to deploy it so that it can serve the application. Batch and online model serving are the two main categories. Batch refers to feeding a large amount of data into a model and writing the results to a table, usually as a scheduled operation. For online serving, you deploy the model behind an endpoint so that applications can send it a request and receive a response with low latency.
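The two serving modes above can be sketched in plain Python — the model here is a trivial stand-in, and real systems would use a scheduled job for the batch path and a model server behind an HTTP endpoint for the online path:

```python
def model(features):
    # Stand-in for a trained model's prediction function.
    return sum(features)

def batch_serve(rows):
    """Batch: score a large table of inputs and write all results out."""
    return [(row_id, model(features)) for row_id, features in rows]

def online_serve(request):
    """Online: handle a single request and return a prediction immediately."""
    return {"prediction": model(request["features"])}

table = [("a", [1, 2]), ("b", [3, 4])]
scored = batch_serve(table)                     # scheduled, high-throughput path
response = online_serve({"features": [5, 6]})   # low-latency, per-request path
```

The trade-off is throughput versus latency: the batch path amortizes model loading over many rows, while the online path keeps the model resident so each request returns quickly.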


SciAnnotate: A Tool for Integrating Weak Labeling Sources for Sequence Labeling

Liu, Mengyang, Luo, Haozheng, Thong, Leonard, Li, Yinghao, Zhang, Chao, Song, Le

arXiv.org Artificial Intelligence

Weak labeling is a popular weak supervision strategy for Named Entity Recognition (NER) tasks, with the goal of reducing the need for hand-crafted annotations. Although there are numerous remarkable annotation tools for NER labeling, the subject of integrating weak labeling sources remains unexplored. We introduce a web-based text annotation tool called SciAnnotate, which stands for scientific annotation tool. Compared to frequently used text annotation tools, our tool allows for the development of weak labels in addition to providing a manual annotation experience, and it offers users multiple user-friendly interfaces for creating weak labels. SciAnnotate additionally allows users to incorporate their own language models and visualize their models' output for evaluation. In this study, taking multi-source weak label denoising as an example, we utilize a Bertifying Conditional Hidden Markov Model to denoise the weak labels generated by our tool. We also evaluate our annotation tool on the dataset provided by Mysore, which contains 230 annotated materials synthesis procedures. The results show a 53.7% reduction in annotation time and a 1.6% increase in recall when using weak label denoising. An online demo is available at https://sciannotate.azurewebsites.net/ (a demo account can be found in the README); we do not host a model server with it, so please check the README in the supplementary material for model server usage.


KServe: A Robust and Extensible Cloud Native Model Server

#artificialintelligence

If you are familiar with Kubeflow, you know KFServing as the platform's model server and inference engine. In September last year, the KFServing project went through a transformation to become KServe. Apart from the name change, KServe is now an independent component that has graduated from the Kubeflow project. The separation allows KServe to evolve as a separate, cloud native inference engine deployed as a standalone model server. Of course, it will continue to have tight integration with Kubeflow, but the two will be treated and maintained as independent open source projects.


Serving ML Models in Production: Common Patterns - KDnuggets

#artificialintelligence

This post is based on Simon Mo's "Patterns of Machine Learning in Production" talk from Ray Summit 2021. Over the past couple of years, we've listened to ML practitioners across many different industries to learn and improve the tooling around ML production use cases. Through this, we've seen four common patterns of machine learning in production: pipeline, ensemble, business logic, and online learning. In the ML serving space, implementing these patterns typically involves a tradeoff between ease of development and production readiness. Ray Serve was built to support these patterns by being both easy to develop and production ready. It is a scalable and programmable serving framework built on top of Ray to help you scale your microservices and ML models in production.
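Two of the patterns named above, pipeline and ensemble, can be sketched in plain Python — Ray Serve would express each stage as a deployment, but the control flow is the same; all models here are trivial stand-ins:

```python
def preprocess(x):
    # Stand-in for a feature-transformation stage.
    return [v / 10 for v in x]

def model_a(x):
    return sum(x)  # stand-in model

def model_b(x):
    return max(x)  # stand-in model

def pipeline(x):
    """Pipeline: stages feed into one another in sequence."""
    return model_a(preprocess(x))

def ensemble(x):
    """Ensemble: several models score the same input; results are combined."""
    scores = [model_a(x), model_b(x)]
    return sum(scores) / len(scores)
```

The business-logic and online-learning patterns differ mainly in what surrounds the model call: conditional routing around it, or feeding served predictions back into training.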


How PyTorch And AWS Come To The Rescue Of ML Models In Production

#artificialintelligence

"Today, more than 83% of the cloud-based PyTorch projects happen on AWS." The Computer Vision Developer Conference (CVDC) 2020 is a two-day event (13-14 Aug) organized by the Association of Data Scientists (ADaSci), a premier global professional body of data science and machine learning professionals. Apart from tech talks covering a wide range of topics, CVDC 2020 also features paper presentations, exhibitions, and hackathons. There is also a full-day workshop on computer vision that comes with a participation certificate for attendees.


Serverless inferencing on Kubernetes

Cox, Clive, Sun, Dan, Tarn, Ellis, Singh, Animesh, Kelkar, Rakesh, Goodwin, David

arXiv.org Machine Learning

Organisations are increasingly putting machine learning models into production at scale. The increasing popularity of serverless scale-to-zero paradigms presents an opportunity for deploying machine learning models to help mitigate infrastructure costs when many models may not be in continuous use. We will discuss the KFServing project which builds on the KNative serverless paradigm to provide a serverless machine learning inference solution that allows a consistent and simple interface for data scientists to deploy their models. We will show how it solves the challenges of autoscaling GPU based inference and discuss some of the lessons learnt from using it in production.


Machine Learning and Real-Time Analytics in Apache Kafka Applications

#artificialintelligence

The relationship between Apache Kafka and machine learning (ML) is an interesting one that I've written about quite a bit in How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning. This blog post addresses a specific part of building a machine learning infrastructure: the deployment of an analytic model in a Kafka application for real-time predictions. Model training and model deployment can be two separate processes. However, you can also use many of the same steps for integration and data preprocessing because you often need to perform the same integration, filter, enrichment, and aggregation of data for model training and model inference. We will discuss and compare two different options for model deployment: model servers with remote procedure calls (RPCs), and natively embedding models into Kafka client applications.
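The second option described above — natively embedding the model in the Kafka client application — can be sketched as follows. The in-memory list stands in for a Kafka topic, and the scoring function is a trivial stand-in for a trained model; a real application would use a Kafka consumer's poll loop instead:

```python
# Sketch of the "embedded model" option: the model is loaded inside the
# stream-processing application itself, so each event is scored locally
# with no remote procedure call to a separate model server.

def load_model():
    # Stand-in for deserializing a trained model, e.g. a fraud detector.
    return lambda event: event["amount"] > 100

def consume(events):
    model = load_model()          # loaded once, at application start
    results = []
    for event in events:          # in Kafka, this would be a consumer poll loop
        prediction = model(event) # local inference, no RPC round trip
        results.append({"id": event["id"], "fraud": prediction})
    return results

predictions = consume([{"id": 1, "amount": 50}, {"id": 2, "amount": 500}])
```

The RPC alternative would replace the local `model(event)` call with a network request to a model server, trading per-event latency for centralized model management.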


The Rise of the Model Servers

#artificialintelligence

One of the exciting developments in machine learning recently is the rapid emergence of a new class of model servers. Model servers simplify the task of deploying machine learning at scale, the same way app servers simplify the task of delivering a web app or API to end users. The rise of model servers, coupled with increasingly interoperable models, will likely accelerate the adoption of user-facing machine learning in the wild. Although there has been an abundance of open source machine learning software, much of the ecosystem has been focused on model-building. The large Internet companies have built their own model serving infrastructure (such as FBLearner Predictor and Michelangelo), but there have been few easy options for the rest of us.


Scaling Machine Learning from 0 to millions of users, part 1

#artificialintelligence

I suppose most Machine Learning (ML) models are conceived on a whiteboard or a napkin, and born on a laptop. As the fledgling creatures start babbling their first predictions, we're filled with pride and high hopes for their future abilities. Alas, we know deep down in our hearts that not all of them will be successful, far from it. A small number fail us quickly as we build them. Others look promising and demonstrate some level of predictive power.